RIOT: I/O-Efficient Numerical Computing without SQL

Authors

  • Yi Zhang
  • Herodotos Herodotou
  • Jun Yang

Abstract

R is a numerical computing environment that is widely popular for statistical data analysis. Like many such environments, R performs poorly on large datasets whose size exceeds that of physical memory. We present our vision of RIOT (R with I/O Transparency), a system that makes R programs I/O-efficient in a way that is transparent to users. We describe our experience with RIOT-DB, an initial prototype that uses a relational database system as its backend. Despite the overhead of generic database systems and their inadequacy in handling array data and numerical computation, RIOT-DB significantly outperforms R in many large-data scenarios, thanks to a suite of high-level, inter-operation optimizations that integrate seamlessly into R. While many techniques in RIOT are inspired by databases (and, in RIOT-DB, realized by a database system), RIOT users are insulated from anything database-related. Unlike previous approaches, which require users to learn new languages and rewrite their programs to interface with a database, RIOT will, we believe, be easier for most R users to adopt.
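
The abstract does not spell out which inter-operation optimizations RIOT uses. One well-known optimization of this kind is deferred (lazy) evaluation: instead of computing and storing every intermediate vector, operations build an expression that is evaluated only when results are actually needed, so intermediates are never materialized and a request for a few elements touches only the data those elements depend on. The following is a minimal, hypothetical Python sketch of that general idea, not RIOT's or R's API; `LazyVec`, `CountingSource`, `take`, and `chunks` are invented names, and the in-memory `CountingSource` merely stands in for disk-resident data so that reads can be counted.

```python
from __future__ import annotations

import operator
from typing import Callable, Iterator, List, Sequence


class CountingSource:
    """Hypothetical stand-in for a disk-resident array that counts element reads."""

    def __init__(self, data: Sequence[float]):
        self.data = list(data)
        self.reads = 0

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, i: int) -> float:
        self.reads += 1  # each element fetch would be I/O in a real system
        return self.data[i]


class LazyVec:
    """A vector expression whose evaluation is deferred until values are requested."""

    def __init__(self, length: int, elem: Callable[[int], float]):
        self.length = length
        self._elem = elem  # computes element i on demand; nothing is stored

    @staticmethod
    def from_data(data: Sequence[float]) -> LazyVec:
        return LazyVec(len(data), lambda i: data[i])

    def _binop(self, other, op) -> LazyVec:
        # Build a new expression node; no intermediate vector is materialized.
        if isinstance(other, LazyVec):
            if other.length != self.length:
                raise ValueError("length mismatch")
            return LazyVec(self.length, lambda i: op(self._elem(i), other._elem(i)))
        return LazyVec(self.length, lambda i: op(self._elem(i), other))

    def __add__(self, other) -> LazyVec:
        return self._binop(other, operator.add)

    def __mul__(self, other) -> LazyVec:
        return self._binop(other, operator.mul)

    def take(self, indices: Sequence[int]) -> List[float]:
        # Selective evaluation: only the requested elements are computed.
        return [self._elem(i) for i in indices]

    def chunks(self, chunk_size: int) -> Iterator[List[float]]:
        # Full evaluation in bounded-size pieces, so memory use stays small.
        for start in range(0, self.length, chunk_size):
            stop = min(start + chunk_size, self.length)
            yield [self._elem(i) for i in range(start, stop)]
```

For example, with `a` and `b` wrapping one million elements each, `d = LazyVec.from_data(a) + LazyVec.from_data(b) * 2` allocates nothing, and `d.take([0, 10, 999_999])` reads only three elements from each source. A real system would evaluate in blocks rather than element by element; the per-element closures here only keep the sketch short.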


Similar resources

RIOT – A Parallel Input/Output Tracer

Input/output (I/O) operations can represent a significant proportion of the run-time when large scientific applications are run in parallel. Although there have been advances in the form of file-format libraries, file-system design and I/O hardware, a growing divergence exists between the performance of parallel file-systems and compute processing rates. The effect is often a bottleneck when an...

Light-Weight Parallel I/O Analysis at Scale

Input/output (I/O) operations can represent a significant proportion of the run-time when large scientific applications are run in parallel. Although there have been advances in the form of file-format libraries, file system design and I/O hardware, a growing divergence exists between the performance of parallel file systems and compute processing rates. In this paper we utilise RIOT, an input/...

Automating Layout of Relational Databases

The choice of database layout, i.e., how database objects such as tables and indexes are assigned to disk drives can significantly impact the I/O performance of the system. Today, DBAs typically rely on fully striping objects across all available disk drives as the basic mechanism for optimizing I/O performance. While full striping maximizes I/O parallelism, when query execution involves co-acc...

A Specific Encryption Solution for Data Warehouses

Protecting Data Warehouses (DWs) is critical, because they store the secrets of the business. Although published research and best practice guides state encryption is the best way to assure the confidentiality of sensitive data and maintain high performance, this adds overheads that jeopardize their feasibility in DWs. In this paper, we propose a Specific Encryption Solution tailored for DWs (S...

Riot control agents: pharmacology, toxicology, biochemistry and chemistry.

The desired effect of all riot control agents is the temporary disablement of individuals by way of intense irritation of the mucous membranes and skin. Generally, riot control agents can produce acute site-specific toxicity where sensory irritation occurs. Early riot control agents, namely, chloroacetophenone (CN) and chlorodihydrophenarsazine (DM), have been replaced with 'safer' agents such ...

Journal:
  • CoRR

Volume: abs/0909.1766

Publication date: 2009